164 research outputs found

    Agents and stream data mining: a new perspective

    Many organizations struggle with the massive amount of data they collect. Today, data do more than serve as the ingredients for churning out statistical reports. They help support efficient operations in many organizations, and to some extent, data provide the competitive intelligence organizations need to survive in today's economy. Data mining can't always deliver timely and relevant results because data are constantly changing. However, stream-data processing might be more effective, judging by the Matrix project.

    Reducing Cognitive Overheads in a Web Warehouse using Reverse-Osmosis

    This paper provides a quantitative analysis of reducing cognitive overheads in a Web warehouse using an important class of operation called reverse osmosis. The analysis examines two cognitive overheads: locating relevant nodes or information, and the display time of a Web table. A reverse-osmosis operation enables us to eliminate irrelevant information from a collection of Web documents stored in the form of a Web table. We call such an operation reverse osmosis because it is analogous to the reverse osmosis process in the field of water purification. We discuss a formal algorithm of the reverse-osmosis operatio

    Cost-benefit Analysis of Web Bag in a Web Warehouse

    Sets and bags are closely related structures and have been studied in relational databases. A bag is different from a set in that it is sensitive to the number of times an element occurs, while a set is not. In this paper, we introduce the concept of a Web bag in the context of a World Wide Web warehouse called WHOWEDA (WareHouse Of WEb DAta) which we are currently building. Informally, a Web bag is a Web table which allows multiple occurrences of identical Web types. A Web bag helps one to discover useful knowledge from a Web table, such as visible documents or Web sites (i.e. documents/sites which can be reached by many paths), luminous documents (i.e. documents with many outgoing links) and luminous paths (i.e. frequently traversed paths). In this paper, we provide a cost-benefit analysis of materializing Web bags as compared to Web tables with distinct Web tuple

    Scheduling queries to improve the freshness of a website

    In recent years the WWW has become a new advertising medium that corporations use to increase their exposure to consumers. For a very large website whose content is derived from some source database, it is important to maintain its freshness in response to changes to the base data. This issue is particularly significant for websites presenting fast-changing information such as stock exchange information and product information. In this paper, we formally define and study the freshness of a website that is refreshed by scheduling a set of queries to fetch fresh data from the databases. Then, we propose several online scheduling algorithms and compare the performance of the algorithms on the freshness metric. Our conclusion is verified by empirical results. Keywords: Internet Data Management, View Maintenance, Query Optimization, Hard Real-Time Scheduling. 1 Introduction: The popularity of the World-Wide Web (WWW) has made it a prime vehicle for disseminating information. More and ..

    Product Schema Integration for Electronic Commerce: A synonym comparison approach


    ViDE: A Visual Data Extraction Environment for the Web
